FIGURE 1



AGTCCCAGACGGGCTTTTCCCAGAGAGCTAAAAGAGAAGGGCCAGAGAATGTCGTCCCAG 
CCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACTCCTATGGCAGCTGGTAC
ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGGCCTCCTGCCAC 
ACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGCTGTCAATCCTTGTGCTG 
CTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG
CCCGGCCTGCCCAGCCCTGTGGATTTCTTGGCTGGGGACAGGCCCCGGGCAGTGCCTGCT
GCTGTTTTCATGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTG
CCCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG
GCCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGT 
GCCACGGCTGGCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTT 
GGGGTCCAGGTCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTAC 

TCCCTGCTGGCCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCT

GTGCAGCTGGTGAGAAGCTTCAGCGGTAGGACAGGAGCAGGCTCGAAGGGGCTGCAGAGC 
AGCTACTCTGAGGAATATCTGAGGAACCTGCTTTGCAGGAAGAAGCTGGGAAGCAGCTAC 
CACACCTCCAAGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTAC 
ACTCCACAGCCAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGG
ACGGCCATTTACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAG

GTGAGGGCAGGGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTC 
TCCGAGGACAAGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTG 
TGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGGGCTCA 
CTGGTGACACACAGGACCAACCTTCGAGCTCTGCACGGAGGAGCTGCCCTGGACTTGAGT 

CCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGT

GCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTG 
GGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 
GTCTTCCGTTCCCTGGAGTCCTCGTGGGCCTTCTGGCTGACTTTGGCCCTGGCTGTGATC 
CTGCAGAACATGGGAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTG 

ACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGGTGGTG

GGTGGCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGGCATCCACCTT 
GGCCAGATGGACCTGAGCCTGCTGCCACCGAGAGGCGCCACTCTCGACCCCGGCTAGTAC 
AGGTACCGAAACTTCTTGAAGATTGAAGTCAGGCAGTGGCATCCAGCCATGACAGCCTTC 
TGCTCCCTGCTCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGAC 

AGCCTCAGACCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATG

GCGAAGGGAGCTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACG 
CTGCTGCACAACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 
GCCCAGCCCTGAGGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCC 
TGCCTACCATCCTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCA 

GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAG

GGCTCTGCTCCACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAAAAACTG 
GTGGGTTAGGGCCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTC 
CCTACCCTGGCTCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACT 
CCAGCCCAGCTCCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCT 
CACCCCCTCAGCGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGTCCTCTGGC
CTGCAGGGCAGCCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGA 
GAGCCAGATATTTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTC 
CCTGCAATAAACTTGTTCCTGAGAAAAAAAAAAAAAAAAAA^^ 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASLS
ILVLLLLAMLVRRRQLWPDCVRGRPGLPSPVDFLAGDRPRAVPAAVFMVLLSSLCLLLPD
EDALPFLTLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLS
WAHLGVQVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSK
GLQSSYSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSA
TLTGTAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLW
ALEVCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCW
MSFSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLA
LAVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAMVATWRVLLSALYN
AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTMA
APQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTALL
GANGAQP

Important features of the protein:

Signal peptide:
None

Transmembrane domain:
54-69
102-119
148-166
207-222
301-320
364-380
431-451
474-489
560-535

Motif file:

Motif name: N-glycosylation site.
8-12

Motif name: N-myristoylation site.
50-56
176-182
241-247
317-323
341-347
525-531
627-633
631-637
640-646
661-667

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
364-375

Motif name: ATP/GTP-binding site motif A (P-loop).
132-140
FIGURE 3A

PRO                  XXXXXXXXXXXXXXX      (Length = 15 amino acids)

Comparison Protein   XXXXXYYYYYYY         (Length = 12 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues
of the PRO polypeptide) =

5 divided by 15 = 33.3%




FIGURE 3B

PRO                  XXXXXXXXXX           (Length = 10 amino acids)

Comparison Protein   XXXXXYYYYYYZZYZ      (Length = 15 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues
of the PRO polypeptide) =

5 divided by 10 = 50%



FIGURE 3C

PRO-DNA              NNNNNNNNNNNNNN       (Length = 14 nucleotides)

Comparison DNA       NNNNNNLLLLLLLLLL     (Length = 16 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

6 divided by 14 = 42.9%



FIGURE 3D

PRO-DNA              NNNNNNNNNNNN         (Length = 12 nucleotides)

Comparison DNA       NNNNLLLVV            (Length = 9 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

4 divided by 12 = 33.3%
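
The identity calculation in Figures 3A to 3D can be sketched in C as follows. This is only an illustration: it compares the two sequences position by position without gaps, which is an assumption standing in for the ALIGN-2 alignment named in the figures, and the function names `count_identical` and `percent_identity` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Count positions where both sequences carry the same character.
 * Assumption: an ungapped, position-by-position comparison is used
 * here in place of the ALIGN-2 alignment. */
size_t count_identical(const char *pro, const char *cmp)
{
    size_t n = 0;

    while (*pro && *cmp) {
        if (*pro == *cmp)
            n++;
        pro++;
        cmp++;
    }
    return n;
}

/* % identity = 100 * identical matches / length of the PRO sequence. */
double percent_identity(const char *pro, const char *cmp)
{
    size_t len = strlen(pro);

    if (len == 0)
        return 0.0;
    return 100.0 * (double)count_identical(pro, cmp) / (double)len;
}
```

Note that the divisor is always the length of the PRO (or PRO-DNA) sequence, not the comparison sequence, which is why Figures 3A and 3B give different percentages for the same five matches.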
/*
 *
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */
#define	_M	-8	/* value of a match with a stop */

int	_day[26][26] = {
/*	 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */	{ 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */	{ 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */	{-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */	{ 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */	{ 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */	{-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */	{ 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */	{-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */	{-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */	{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */	{-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */	{-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */	{-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */	{ 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */	{_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */	{ 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */	{ 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */	{-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */	{ 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */	{ 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */	{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */	{ 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */	{-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */	{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */	{-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */	{ 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};
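
The header comment says that the Z row is the average of E and Q and the B row the average of N and D. A minimal sketch of that averaging is given below; the helper name `ambiguity_score` is hypothetical, and the use of truncating integer division is an assumption that merely agrees with the table entries checked in the comments.

```c
/* Score for an ambiguity code, taken as the average of the scores of
 * the two residues it stands for. Integer division in C (C99 and
 * later) truncates toward zero, which agrees with table entries such
 * as B-C = -4 (from N-C = -4 and D-C = -5) and Z-D = 2 (from E-D = 3
 * and Q-D = 2). */
int ambiguity_score(int first, int second)
{
    return (first + second) / 2;
}
```

For example, `ambiguity_score(2, 1)` gives 1, matching the B-H entry derived from N-H = 2 and D-H = 1.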
/*
 */
#include <stdio.h>
#include <ctype.h>

#define	MAXJMP	16	/* max jumps in a diag */
#define	MAXGAP	24	/* don't continue to penalize gaps larger than this */
#define	JMPS	1024	/* max jmps in a path */
#define	MX	4	/* save if there's at least MX-1 bases since last jmp */

#define	DMAT	3	/* value of matching bases */
#define	DMIS	0	/* penalty for mismatched bases */
#define	DINS0	8	/* penalty for a gap */
#define	DINS1	1	/* penalty per base */
#define	PINS0	8	/* penalty for a gap */
#define	PINS1	4	/* penalty per residue */

struct jmp {
	short		n[MAXJMP];	/* size of jmp (neg for dely) */
	unsigned short	x[MAXJMP];	/* base no. of jmp in seq x */
};					/* limits seq to 2^16 - 1 */

struct diag {
	int		score;		/* score at last jmp */
	long		offset;		/* offset of prev block */
	short		ijmp;		/* current jmp index */
	struct jmp	jp;		/* list of jmps */
};

struct path {
	int	spc;		/* number of leading spaces */
	short	n[JMPS];	/* size of jmp (gap) */
	int	x[JMPS];	/* loc of jmp (last elem before gap) */
};

char		*ofile;			/* output file name */
char		*namex[2];		/* seq names: getseqs() */
char		*prog;			/* prog name for err msgs */
char		*seqx[2];		/* seqs: getseqs() */
int		dmax;			/* best diag: nw() */
int		dmax0;			/* final diag */
int		dna;			/* set if dna: main() */
int		endgaps;		/* set if penalizing end gaps */
int		gapx, gapy;		/* total gaps in seqs */
int		len0, len1;		/* seq lens */
int		ngapx, ngapy;		/* total size of gaps */
int		smax;			/* max score: nw() */
int		*xbm;			/* bitmap for matching */
long		offset;			/* current offset in jmp file */
struct	diag	*dx;			/* holds diagonals */
struct	path	pp[2];			/* holds path for seqs */

char		*calloc(), *malloc(), *index(), *strcpy();
char		*getseq(), *g_calloc();
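
The gap constants above combine into an affine penalty: a fixed cost for opening a gap plus a cost per element, with no further per-element charge once the gap is longer than MAXGAP. The following sketch illustrates that rule for the case where end gaps are not penalized. The function name `gap_penalty` and its parameters are hypothetical, and the sample values used in the note are illustrative only.

```c
/* Total penalty for one gap of `len` elements: an opening cost plus a
 * per-element cost, where elements beyond `maxgap` add nothing further.
 * Assumption: end gaps are not being penalized (endgaps == 0). */
int gap_penalty(int len, int open, int per_elem, int maxgap)
{
    int charged;

    if (len <= 0)
        return 0;
    charged = (len < maxgap) ? len : maxgap;
    return open + per_elem * charged;
}
```

With an opening cost of 8, a per-element cost of 1 and a cap of 24, a gap of 3 elements costs 11, and gaps of 24 and 30 elements both cost 32.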
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case and may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static	_dbval[26] = {
	1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static	_pbval[26] = {
	1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
	128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
	1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
	1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
	int	ac;
	char	*av[];
{
	prog = av[0];
	if (ac != 3) {
		fprintf(stderr,"usage: %s file1 file2\n", prog);
		fprintf(stderr,"where file1 and file2 are two dna or two protein sequences.\n");
		fprintf(stderr,"The sequences can be in upper- or lower-case\n");
		fprintf(stderr,"Any lines beginning with ';' or '<' are ignored\n");
		fprintf(stderr,"Output is in the file \"align.out\"\n");
		exit(1);
	}
	namex[0] = av[1];
	namex[1] = av[2];
	seqx[0] = getseq(namex[0], &len0);
	seqx[1] = getseq(namex[1], &len1);
	xbm = (dna)? _dbval : _pbval;

	endgaps = 0;			/* 1 to penalize endgaps */
	ofile = "align.out";		/* output file */

	nw();		/* fill in the matrix, get the possible jmps */
	readjmps();	/* get the actual jmps */
	print();	/* print stats, alignment */

	cleanup(0);	/* unlink any tmp files */
}
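
The `_dbval` table encodes each nucleotide letter as a bit set (A=1, C=2, G=4, T=U=8), so that an ambiguity code is the OR of the bases it covers and two letters match when their bit sets overlap. A small sketch of that test follows; the table values are copied from the listing, while the function name `dna_match` and the range check on its arguments are assumptions added for illustration.

```c
/* Bit codes for the letters A..Z, copied from the _dbval table:
 * A=1, C=2, G=4, T=U=8; ambiguity codes OR together the bases they
 * cover (for example R = A|G = 5, Y = C|T = 10, N = 15). */
static const int dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

/* Return 1 when two upper-case base codes share at least one base.
 * Assumption: letters are contiguous in the character set (ASCII). */
int dna_match(char a, char b)
{
    if (a < 'A' || a > 'Z' || b < 'A' || b > 'Z')
        return 0;
    return (dbval[a - 'A'] & dbval[b - 'A']) != 0;
}
```

Under this encoding `dna_match('A', 'R')` is true because R covers A and G, while `dna_match('C', 'R')` is false.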
/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
	char		*px, *py;	/* seqs and ptrs */
	int		*ndely, *dely;	/* keep track of dely */
	int		ndelx, delx;	/* keep track of delx */
	int		*tmp;		/* for swapping row0, row1 */
	int		mis;		/* score for each type */
	int		ins0, ins1;	/* insertion penalties */
	register	id;		/* diagonal index */
	register	ij;		/* jmp index */
	register	*col0, *col1;	/* score for curr, last row */
	register	xx, yy;		/* index into seqs */

	dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

	ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
	dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
	col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
	col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
	ins0 = (dna)? DINS0 : PINS0;
	ins1 = (dna)? DINS1 : PINS1;

	smax = -10000;
	if (endgaps) {
		for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
			col0[yy] = dely[yy] = col0[yy-1] - ins1;
			ndely[yy] = yy;
		}
		col0[0] = 0;	/* Waterman Bull Math Biol 84 */
	}
	else
		for (yy = 1; yy <= len1; yy++)
			dely[yy] = -ins0;

	/* fill in match matrix
	 */
	for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
		/* initialize first entry in col
		 */
		if (endgaps) {
			if (xx == 1)
				col1[0] = delx = -(ins0+ins1);
			else
				col1[0] = delx = col0[0] - ins1;
			ndelx = xx;
		}
		else {
			col1[0] = 0;
			delx = -ins0;
			ndelx = 0;
		}
		/* ...nw */
		for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
			mis = col0[yy-1];
			if (dna)
				mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
			else
				mis += _day[*px-'A'][*py-'A'];

			/* update penalty for del in x seq;
			 * favor new del over ongoing del
			 * ignore MAXGAP if weighting endgaps
			 */
			if (endgaps || ndely[yy] < MAXGAP) {
				if (col0[yy] - ins0 >= dely[yy]) {
					dely[yy] = col0[yy] - (ins0+ins1);
					ndely[yy] = 1;
				} else {
					dely[yy] -= ins1;
					ndely[yy]++;
				}
			} else {
				if (col0[yy] - (ins0+ins1) >= dely[yy]) {
					dely[yy] = col0[yy] - (ins0+ins1);
					ndely[yy] = 1;
				} else
					ndely[yy]++;
			}

			/* update penalty for del in y seq;
			 * favor new del over ongoing del
			 */
			if (endgaps || ndelx < MAXGAP) {
				if (col1[yy-1] - ins0 >= delx) {
					delx = col1[yy-1] - (ins0+ins1);
					ndelx = 1;
				} else {
					delx -= ins1;
					ndelx++;
				}
			} else {
				if (col1[yy-1] - (ins0+ins1) >= delx) {
					delx = col1[yy-1] - (ins0+ins1);
					ndelx = 1;
				} else
					ndelx++;
			}

			/* pick the maximum score; we're favoring
			 * mis over any del and delx over dely
			 */
FIGURE 4F

			id = xx - yy + len1 - 1;
			if (mis >= delx && mis >= dely[yy])
				col1[yy] = mis;
			else if (delx >= dely[yy]) {
				col1[yy] = delx;
				ij = dx[id].ijmp;
				if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
				&& xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
					dx[id].ijmp++;
					if (++ij >= MAXJMP) {
						writejmps(id);
						ij = dx[id].ijmp = 0;
						dx[id].offset = offset;
						offset += sizeof(struct jmp) + sizeof(offset);
					}
				}
				dx[id].jp.n[ij] = ndelx;
				dx[id].jp.x[ij] = xx;
				dx[id].score = delx;
			}
			else {
				col1[yy] = dely[yy];
				ij = dx[id].ijmp;
				if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
				&& xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
					dx[id].ijmp++;
					if (++ij >= MAXJMP) {
						writejmps(id);
						ij = dx[id].ijmp = 0;
						dx[id].offset = offset;
						offset += sizeof(struct jmp) + sizeof(offset);
					}
				}
				dx[id].jp.n[ij] = -ndely[yy];
				dx[id].jp.x[ij] = xx;
				dx[id].score = dely[yy];
			}
			if (xx == len0 && yy < len1) {
				/* last col
				 */
				if (endgaps)
					col1[yy] -= ins0+ins1*(len1-yy);
				if (col1[yy] > smax) {
					smax = col1[yy];
					dmax = id;
				}
			}
		}
		if (endgaps && xx < len0)
			col1[yy-1] -= ins0+ins1*(len0-xx);
		if (col1[yy-1] > smax) {
			smax = col1[yy-1];
			dmax = id;
		}
		tmp = col0; col0 = col1; col1 = tmp;
	}
	(void) free((char *)ndely);
	(void) free((char *)dely);
	(void) free((char *)col0);
	(void) free((char *)col1);
}
/* Page 4 of nw.c */
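
The routine above interleaves scoring with bookkeeping for the later traceback. To show only the scoring idea, here is a simplified, self-contained sketch of a global alignment score with affine gap penalties. It is not the listing's algorithm: it uses a standard three-state recurrence over full matrices, always penalizes end gaps, applies no MAXGAP cap, and records no jumps. The function name `nw_affine_score` and its parameters are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define NEG_INF (-1000000)   /* stands in for minus infinity */

static int max3(int a, int b, int c)
{
    int m = (a > b) ? a : b;
    return (m > c) ? m : c;
}

/* Best global alignment score of a and b. A gap of k elements costs
 * gap_open + k * gap_ext, end gaps included. Returns NEG_INF if
 * memory cannot be allocated. */
int nw_affine_score(const char *a, const char *b,
                    int match, int mismatch, int gap_open, int gap_ext)
{
    int n = (int)strlen(a), m = (int)strlen(b), w = m + 1, i, j, best;
    size_t cells = (size_t)(n + 1) * (size_t)w;
    int *M = malloc(cells * sizeof *M);  /* path ends in a (mis)match */
    int *X = malloc(cells * sizeof *X);  /* path ends with a[i] over a gap */
    int *Y = malloc(cells * sizeof *Y);  /* path ends with b[j] over a gap */

    if (!M || !X || !Y) {
        free(M); free(X); free(Y);
        return NEG_INF;
    }
    M[0] = 0;
    X[0] = Y[0] = NEG_INF;
    for (i = 1; i <= n; i++) {           /* first column: gap in b */
        M[i * w] = Y[i * w] = NEG_INF;
        X[i * w] = -(gap_open + i * gap_ext);
    }
    for (j = 1; j <= m; j++) {           /* first row: gap in a */
        M[j] = X[j] = NEG_INF;
        Y[j] = -(gap_open + j * gap_ext);
    }
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            int d = (i - 1) * w + (j - 1);   /* diagonal cell */
            int u = (i - 1) * w + j;         /* cell above */
            int l = i * w + (j - 1);         /* cell to the left */
            int c = i * w + j;               /* current cell */

            M[c] = s + max3(M[d], X[d], Y[d]);
            X[c] = max3(M[u] - gap_open - gap_ext, X[u] - gap_ext,
                        Y[u] - gap_open - gap_ext);
            Y[c] = max3(M[l] - gap_open - gap_ext, Y[l] - gap_ext,
                        X[l] - gap_open - gap_ext);
        }
    }
    best = max3(M[n * w + m], X[n * w + m], Y[n * w + m]);
    free(M); free(X); free(Y);
    return best;
}
```

The listing instead keeps only two columns of scores plus per-diagonal jump records, which keeps memory use roughly linear in the sequence lengths at the cost of the more involved traceback in readjmps().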
/* print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment described in array pp[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define	SPC	3
#define	P_LINE	256	/* maximum output line */
#define	P_SPC	3	/* space between name or num and seq */

extern	_day[26][26];
int	olen;		/* set output line length */
FILE	*fx;		/* output file */

print()
{
	int	lx, ly, firstgap, lastgap;	/* overlap */

	if ((fx = fopen(ofile, "w")) == 0) {
		fprintf(stderr,"%s: can't write %s\n", prog, ofile);
		cleanup(1);
	}
	fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
	fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
	olen = 60;
	lx = len0;
	ly = len1;
	firstgap = lastgap = 0;
	if (dmax < len1 - 1) {	/* leading gap in x */
		pp[0].spc = firstgap = len1 - dmax - 1;
		ly -= pp[0].spc;
	}
	else if (dmax > len1 - 1) {	/* leading gap in y */
		pp[1].spc = firstgap = dmax - (len1 - 1);
		lx -= pp[1].spc;
	}
	if (dmax0 < len0 - 1) {	/* trailing gap in x */
		lastgap = len0 - dmax0 - 1;
		lx -= lastgap;
	}
	else if (dmax0 > len0 - 1) {	/* trailing gap in y */
		lastgap = dmax0 - (len0 - 1);
		ly -= lastgap;
	}
	getmat(lx, ly, firstgap, lastgap);
	pr_align();
}
/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
	int	lx, ly;			/* "core" (minus endgaps) */
	int	firstgap, lastgap;	/* leading trailing overlap */
{
	int		nm, i0, i1, siz0, siz1;
	char		outx[32];
	double		pct;
	register	n0, n1;
	register char	*p0, *p1;

	/* get total matches, score
	 */
	i0 = i1 = siz0 = siz1 = 0;
	p0 = seqx[0] + pp[1].spc;
	p1 = seqx[1] + pp[0].spc;
	n0 = pp[1].spc + 1;
	n1 = pp[0].spc + 1;
	nm = 0;
	while ( *p0 && *p1 ) {
		if (siz0) {
			p1++;
			n1++;
			siz0--;
		}
		else if (siz1) {
			p0++;
			n0++;
			siz1--;
		}
		else {
			if (xbm[*p0-'A']&xbm[*p1-'A'])
				nm++;
			if (n0++ == pp[0].x[i0])
				siz0 = pp[0].n[i0++];
			if (n1++ == pp[1].x[i1])
				siz1 = pp[1].n[i1++];
			p0++;
			p1++;
		}
	}

	/* pct homology:
	 * if penalizing endgaps, base is the shorter seq
	 * else, knock off overhangs and take shorter core
	 */
	if (endgaps)
		lx = (len0 < len1)? len0 : len1;
	else
		lx = (lx < ly)? lx : ly;
	pct = 100.*(double)nm/(double)lx;
	fprintf(fx, "\n");
	fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
		nm, (nm == 1)? "" : "es", lx, pct);
	/* ...getmat */
	fprintf(fx, "<gaps in first sequence: %d", gapx);
	if (gapx) {
		(void) sprintf(outx, " (%d %s%s)",
			ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
		fprintf(fx,"%s", outx);
	}

	fprintf(fx, ", gaps in second sequence: %d", gapy);
	if (gapy) {
		(void) sprintf(outx, " (%d %s%s)",
			ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
		fprintf(fx,"%s", outx);
	}
	if (dna)
		fprintf(fx,
		"\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
		smax, DMAT, DMIS, DINS0, DINS1);
	else
		fprintf(fx,
		"\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
		smax, PINS0, PINS1);
	if (endgaps)
		fprintf(fx,
		"<endgaps penalized. left endgap: %d %s%s, right endgap: %d %s%s\n",
		firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
		lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
	else
		fprintf(fx, "<endgaps not penalized\n");
}

static		nm;		/* matches in core -- for checking */
static		lmax;		/* lengths of stripped file names */
static		ij[2];		/* jmp index for a path */
static		nc[2];		/* number at start of current line */
static		ni[2];		/* current elem number -- for gapping */
static		siz[2];
static char	*ps[2];		/* ptr to current element */
static char	*po[2];		/* ptr to next output char slot */
static char	out[2][P_LINE];	/* output line */
static char	star[P_LINE];	/* set by stars() */
/*
 * print alignment described in struct path pp[]
 */
static
pr_align()
{
	int		nn;	/* char count */
	int		more;
	register	i;

	for (i = 0, lmax = 0; i < 2; i++) {
		nn = stripname(namex[i]);
		if (nn > lmax)
			lmax = nn;

		nc[i] = 1;
		ni[i] = 1;
		siz[i] = ij[i] = 0;
		ps[i] = seqx[i];
		po[i] = out[i];
	}
	/* Page 3 of nwprint.c */

	for (nn = nm = 0, more = 1; more; ) {
		for (i = more = 0; i < 2; i++) {
			/*
			 * do we have more of this sequence?
			 */
			if (!*ps[i])
				continue;

			more++;

			if (pp[i].spc) {	/* leading space */
				*po[i]++ = ' ';
				pp[i].spc--;
			}
			else if (siz[i]) {	/* in a gap */
				*po[i]++ = '-';
				siz[i]--;
			}
			else {		/* we're putting a seq element
					 */
				*po[i] = *ps[i];
				if (islower(*ps[i]))
					*ps[i] = toupper(*ps[i]);
				po[i]++;
				ps[i]++;

				/*
				 * are we at next gap for this seq?
				 */
				if (ni[i] == pp[i].x[ij[i]]) {
					/*
					 * we need to merge all gaps
					 * at this location
					 */
					siz[i] = pp[i].n[ij[i]++];
					while (ni[i] == pp[i].x[ij[i]])
						siz[i] += pp[i].n[ij[i]++];
				}
				ni[i]++;
			}
		}
		if (++nn == olen || !more && nn) {
			dumpblock();
			for (i = 0; i < 2; i++)
				po[i] = out[i];
			nn = 0;
		}
	}
}
/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
	register	i;

	for (i = 0; i < 2; i++)
		*po[i]-- = '\0';

	(void) putc('\n', fx);
	for (i = 0; i < 2; i++) {
		if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
			if (i == 0)
				nums(i);
			if (i == 0 && *out[1])
				stars();
			putline(i);
			if (i == 0 && *out[1])
				fprintf(fx, star);
			if (i == 1)
				nums(i);
		}
	}
}
/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
	int	ix;	/* index in out[] holding seq line */
{
	char		nline[P_LINE];
	register	i, j;
	register char	*pn, *px, *py;

	for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
		*pn = ' ';
	for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
		if (*py == ' ' || *py == '-')
			*pn = ' ';
		else {
			if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
				j = (i < 0)? -i : i;
				for (px = pn; j; j /= 10, px--)
					*px = j%10 + '0';
				if (i < 0)
					*px = '-';
			}
			else
				*pn = ' ';
			i++;
		}
	}
	*pn = '\0';
	nc[ix] = i;
	for (pn = nline; *pn; pn++)
		(void) putc(*pn, fx);
	(void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
	int	ix;
{
	int		i;
	register char	*px;

	for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
		(void) putc(*px, fx);
	for (; i < lmax+P_SPC; i++)
		(void) putc(' ', fx);

	/* these count from 1:
	 * ni[] is current element (from 1)
	 * nc[] is number at start of current line
	 */
	for (px = out[ix]; *px; px++)
		(void) putc(*px&0x7F, fx);
	(void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
	int		i;
	register char	*p0, *p1, cx, *px;

	if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
	    !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
		return;
	px = star;
	for (i = lmax+P_SPC; i; i--)
		*px++ = ' ';

	for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
		if (isalpha(*p0) && isalpha(*p1)) {

			if (xbm[*p0-'A']&xbm[*p1-'A']) {
				cx = '*';
				nm++;
			}
			else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
				cx = '.';
			else
				cx = ' ';
		}
		else
			cx = ' ';
		*px++ = cx;
	}
	*px++ = '\n';
	*px = '\0';
}
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
	char	*pn;	/* file name (may be path) */
{
	register char	*px, *py;

	py = 0;
	for (px = pn; *px; px++)
		if (*px == '/')
			py = px + 1;
	if (py)
		(void) strcpy(pn, py);
	return(strlen(pn));
}
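
The path-stripping step can also be written with standard library calls. The sketch below is an alternative formulation, not the listing's code: the name `strip_path` is hypothetical, and it uses `memmove` rather than `strcpy` because the source and destination overlap, which `strcpy` does not permit in ISO C.

```c
#include <string.h>

/* Remove everything up to and including the last '/' in `name`, in
 * place, and return the length of what remains. memmove is used
 * because the source and destination regions overlap. */
size_t strip_path(char *name)
{
    char *slash = strrchr(name, '/');

    if (slash != NULL)
        memmove(name, slash + 1, strlen(slash + 1) + 1);
    return strlen(name);
}
```

The buffer passed in must be writable, so a string literal should first be copied into an array.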
/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checking
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char	*jname = "/tmp/homgXXXXXX";		/* tmp file for jmps */
FILE	*fj;

int	cleanup();				/* cleanup tmp file */
long	lseek();
/*
 * remove any tmp file if we blow
 */
cleanup(i)
	int	i;
{
	if (fj)
		(void) unlink(jname);
	exit(i);
}
/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char	*
getseq(file, len)
	char	*file;	/* file name */
	int	*len;	/* seq len */
{
	char		line[1024], *pseq;
	register char	*px, *py;
	int		natgc, tlen;
	FILE		*fp;

	if ((fp = fopen(file,"r")) == 0) {
		fprintf(stderr,"%s: can't read %s\n", prog, file);
		exit(1);
	}
	tlen = natgc = 0;
	while (fgets(line, 1024, fp)) {
		if (*line == ';' || *line == '<' || *line == '>')
			continue;
		for (px = line; *px != '\n'; px++)
			if (isupper(*px) || islower(*px))
				tlen++;
	}
	if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
		fprintf(stderr,"%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
		exit(1);
	}
	pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';
	/* Page 1 of nwsubr.c */

FIGURE 4Q

	/* ...getseq */
	py = pseq + 4;
	*len = tlen;
	rewind(fp);

	while (fgets(line, 1024, fp)) {
		if (*line == ';' || *line == '<' || *line == '>')
			continue;
		for (px = line; *px != '\n'; px++) {
			if (isupper(*px))
				*py++ = *px;
			else if (islower(*px))
				*py++ = toupper(*px);
			if (index("ATGCU",*(py-1)))
				natgc++;
		}
	}
	*py++ = '\0';
	*py = '\0';
	(void) fclose(fp);
	dna = natgc > (tlen/3);
	return(pseq+4);
}
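
The DNA test at the end of getseq() compares the count of A, C, G, T and U characters with one third of the sequence length. A standalone sketch of that heuristic is given below. The function name `looks_like_dna` is hypothetical, and unlike the listing it counts each letter once on its own rather than inspecting the last stored character.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 when more than a third of the letters in `seq` are A, C,
 * G, T or U (case-insensitive), mirroring dna = natgc > (tlen/3).
 * Non-letter characters are ignored. */
int looks_like_dna(const char *seq)
{
    int letters = 0, natgc = 0;

    for (; *seq; seq++) {
        unsigned char c = (unsigned char)*seq;

        if (!isalpha(c))
            continue;
        letters++;
        if (strchr("ACGTU", toupper(c)) != NULL)
            natgc++;
    }
    return natgc > letters / 3;
}
```

Because the comparison is a strict "greater than" against an integer quotient, a sequence with exactly one third such letters is classified as protein when its length is a multiple of three.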



char	*
g_calloc(msg, nx, sz)
	char	*msg;		/* program, calling routine */
	int	nx, sz;		/* number and size of elements */
{
	char		*px, *calloc();

	if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
		if (*msg) {
			fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
			exit(1);
		}
	}
	return(px);
}
/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
	int		fd = -1;
	int		siz, i0, i1;
	register	i, j, xx;

	if (fj) {
		(void) fclose(fj);
		if ((fd = open(jname, O_RDONLY, 0)) < 0) {
			fprintf(stderr, "%s: can't open() %s\n", prog, jname);
			cleanup(1);
		}
	}
	for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
		while (1) {
			for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
				;
			if (j < 0 && dx[dmax].offset && fj) {
				(void) lseek(fd, dx[dmax].offset, 0);
				(void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
				(void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
				dx[dmax].ijmp = MAXJMP-1;
			}
			else
				break;
		}
		if (i >= JMPS) {
			fprintf(stderr, "%s: too many gaps in alignment\n", prog);
			cleanup(1);
		}
		if (j >= 0) {
			siz = dx[dmax].jp.n[j];
			xx = dx[dmax].jp.x[j];
			dmax += siz;
			if (siz < 0) {		/* gap in second seq */
				pp[1].n[i1] = -siz;
				xx += siz;
				/* id = xx - yy + len1 - 1
				 */
				pp[1].x[i1] = xx - dmax + len1 - 1;
				gapy++;
				ngapy -= siz;
				/* ignore MAXGAP when doing endgaps */
				siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
				i1++;
			}
			else if (siz > 0) {	/* gap in first seq */
				pp[0].n[i0] = siz;
				pp[0].x[i0] = xx;
				gapx++;
				ngapx += siz;
				/* ignore MAXGAP when doing endgaps */
				siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
				i0++;
			}
		}
		else
			break;
	}

	/* reverse the order of jmps
	 */
	for (j = 0, i0--; j < i0; j++, i0--) {
		i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
		i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
	}
	for (j = 0, i1--; j < i1; j++, i1--) {
		i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
		i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
	}
	if (fd >= 0)
		(void) close(fd);
	if (fj) {
		(void) unlink(jname);
		fj = 0;
		offset = 0;
	}
}
/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
	int	ix;
{
	char	*mktemp();

	if (!fj) {
		if (mktemp(jname) < 0) {
			fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
			cleanup(1);
		}
		if ((fj = fopen(jname, "w")) == 0) {
			fprintf(stderr, "%s: can't write %s\n", prog, jname);
			exit(1);
		}
	}
	(void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
	(void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}



FIGURE 5 



GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTG
GAAGTGTGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATG 
CGCTCACTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGAC 
TTGAGTCCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGC 

TTCAGTGCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTC
TTCCTGGGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAAC
CTCCTGCTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCT 
GTGATCCTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCA 
CAGCTGACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTG 

CTGGTGGGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATC
CACCTTGGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGC 
TACTACACGTACCGAA 



CACAACCAGCCACCCCTCTAGGATCCCAGCCCAGCTGGTGCTGGGCTCAGAGGAGAAGGC
CCCGTGTTGGGAGCACCCTGCTTGCCTGGAGGGACAAGTTTCCGGGAGAGATCAATAAAG
GAAAGGAAAGAGACAAGGAAGGGAGAGGTCAGGAGAGCGCTTGATTGGAGGAGAAGGGCC
AGAGAATGTCGTCCCAGCCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACT 
CCTATGGCAGCTGGTACATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGG 
AAGTGCCCTCCTGCCACACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGC 

TGTCAATCCTTGTGCTGCTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTG
ACTGTGTGCGTGGCAGGCCCGGCCTGCCCAGGCCCCGGGCAGTGCCTGCTGCTGTTTTCA 
TGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTGCCCTTCCTGA 
CTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGGGCCTGGAAGA 
TACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTGTGGCTGCCTGTGCCACGGCTG 

GCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTTGGGGTCCAGG
TCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTACTCCCTGCTGG 
CCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCTGTGCAGCTGG 
TGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGCAGCTACTCTG 
AGGAATATCTGAGGAACCTCCTTTGGAGGAAGAAGCTGGGAAGCAGCTACCACACCTCCA 

AGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTACAGTCCAGAGC
CAGGATTCCATCTCCCGGTGAAGCTGGTGCTTTCAGCTACACTGACAGGGACGGCCATTT
ACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAGGTGAGGGCAG 
GGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTCTCCGAGGACA 
AGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTGTGCTACATCT 

CAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCACTGGTGACAC

ACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGTCCCTTGCATC 
GGAGTCCCCATCCCTCCCGCGAAGCCATATTCTGTTGGATGAGCTTCAGTGCCTACCAGA 
CAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATGATCTTCTTCCTGGGAACCACGG 
CCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTGCTCTTCCGTT 

CCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATCCTGCAGAACA

TGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTGACCAACCGGC 
GAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTGGGTGCCATAG 
TGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTTGGCCAGATGG 
ACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTACACGTACCGAA 

ACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTCTGCTCCCTGC
TCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGACAGCCTCAGAC
CAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATGGCCAAGGGAG 
CTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACGCTGCTGCACA 
ACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGTGCCCAGCCCT 

GAGGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCCTGCCTACCAC

CTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCAGCAGGTCCTCC 
GGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAGGGCTCTGCTCC 
ACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 
CCTTGGTCCAGGAGCCAGTTGAGCCAGGGGAGCCACATCCAGGCGTCTCCCTACCCTGGC 

TCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACTCCAGCCCAGCT
CCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCTCACCCCCTCAG 
CGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGGCCTCTGGCCTGCAGGGCAG 
CCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGAGAGCCAGATAT 
TTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTCCCTGCAATAAA

CTTGTTCCTGAGAAAAA



FIGURE 7 

MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASL
SILVLLLLAMLVRRRQLWPDCVRGRPGLPRPRAVPAAVFMVLLSSLCLLLPDEDALPFL
TLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLSWAHLGV
QVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSKGLQSS
YSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSATLTG
TAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLWALE
VCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCWMS
FSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLAL
AVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAIVATWRVLLSALYN
AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTM
AAPQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTA
LLGANGAQP
Important features of the protein:

Signal peptide:
none

Transmembrane domain:

Motif name:

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
355-366

Motif name: ATP/GTP-binding site motif A (P-loop).
123-131






FIGURE 10
FIGURE 12B
FIGURE 17
Figure 23
Figure 25

Figure 25 A

Figure 25 B
Figure 26
Stra6 / GAPDH